SAARSHEFF at SemEval-2016 Task 1: Semantic Textual Similarity with Machine Translation Evaluation Metrics and (eXtreme) Boosted Tree Ensembles
Authors
Abstract
This paper describes the SAARSHEFF systems that participated in the English Semantic Textual Similarity (STS) task at SemEval-2016. We extend previous work on using machine translation (MT) evaluation metrics for STS by automatically annotating each pair of text snippets in the STS datasets with a variety of MT scores. We trained our systems using boosted tree ensembles and achieved competitive results that outperform the median Pearson correlation scores of all participating systems.
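The pipeline the abstract outlines is straightforward to sketch: score each sentence pair with MT-evaluation-style features, fit a boosted tree ensemble on the gold similarity labels, and report Pearson correlation on the test split. The snippet below is a minimal illustration of that idea, not the authors' released code: the features (smoothed sentence BLEU, token Jaccard overlap, length ratio), the choice of XGBoost (suggested by the "eXtreme" in the title but still an assumption), the hyperparameters, and the helper names `mt_features`, `featurize`, and `run_sts` are all ours.

```python
# Minimal sketch of an MT-metric-feature + boosted-tree STS system.
# Assumes nltk, xgboost, scipy, numpy are installed; feature set and
# hyperparameters are illustrative, not those of the SAARSHEFF systems.
import numpy as np
import xgboost as xgb
from scipy.stats import pearsonr
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1

def mt_features(s1: str, s2: str) -> list:
    """Toy MT-style scores for one sentence pair (stand-ins for a full metric suite)."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    bleu_fwd = sentence_bleu([t1], t2, smoothing_function=smooth)
    bleu_bwd = sentence_bleu([t2], t1, smoothing_function=smooth)
    jaccard = len(set(t1) & set(t2)) / max(len(set(t1) | set(t2)), 1)
    len_ratio = min(len(t1), len(t2)) / max(len(t1), len(t2), 1)
    return [bleu_fwd, bleu_bwd, jaccard, len_ratio]

def featurize(pairs):
    """Turn a list of (sentence1, sentence2) tuples into a feature matrix."""
    return np.array([mt_features(a, b) for a, b in pairs])

def run_sts(train_pairs, train_gold, test_pairs, test_gold):
    """Train a boosted tree regressor on gold scores (0-5) and return test Pearson r."""
    model = xgb.XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
    model.fit(featurize(train_pairs), np.asarray(train_gold))
    pred = model.predict(featurize(test_pairs))
    r, _ = pearsonr(pred, np.asarray(test_gold))  # Pearson r is the official STS metric
    return r
```

In practice the reported systems annotate pairs with many more MT metrics than the toy features above; the sketch only shows how such scores feed a tree-ensemble regressor evaluated by Pearson correlation.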
Similar Papers
UoW: NLP techniques developed at the University of Wolverhampton for Semantic Similarity and Textual Entailment
This paper presents the system submitted by the University of Wolverhampton for SemEval-2014 Task 1. We proposed a machine learning approach based on features extracted using Typed Dependencies, Paraphrasing, Machine Translation evaluation metrics, Quality Estimation metrics and Corpus Pattern Analysis. Our system performed satisfactorily and obtained a Pearson correlation of 0.711 for the sema...
FBK: Machine Translation Evaluation and Word Similarity metrics for Semantic Textual Similarity
This paper describes the participation of FBK in the Semantic Textual Similarity (STS) task organized within SemEval-2012. Our approach explores lexical, syntactic and semantic machine translation evaluation metrics combined with distributional and knowledge-based word similarity metrics. Our best model achieves 60.77% correlation with human judgements (Mean score) and ranked 20 out of 88 submit...
USAAR-SHEFFIELD: Semantic Textual Similarity with Deep Regression and Machine Translation Evaluation Metrics
This paper describes the USAAR-SHEFFIELD systems that participated in the Semantic Textual Similarity (STS) English task of SemEval-2015. We extend the work on using machine translation evaluation metrics in the STS task. Unlike previous approaches, we regard the metrics' robustness across different text types and conflate the training data across different subcorpora. In addition, we in...
FBK HLT-MT at SemEval-2016 Task 1: Cross-lingual Semantic Similarity Measurement Using Quality Estimation Features and Compositional Bilingual Word Embeddings
This paper describes the system by FBK HLT-MT for cross-lingual semantic textual similarity measurement. Our approach is based on supervised regression with an ensemble of decision trees. In order to assign a semantic similarity score to an input sentence pair, the model combines features collected by state-of-the-art methods in machine translation quality estimation and distance metrics between cro...
UPC-CORE: What Can Machine Translation Evaluation Metrics and Wikipedia Do for Estimating Semantic Textual Similarity?
In this paper we discuss our participation in the SemEval-2013 Semantic Textual Similarity task. Our core features include (i) a set of metrics borrowed from automatic machine translation evaluation, originally intended to score automatic translations against reference translations, and (ii) an instance of explicit semantic analysis, built upon the opening paragraphs of Wikipedia 2010 articles. Our similarity estimator ...